## [1] 1599 12
## 'data.frame': 1599 obs. of 12 variables:
## $ fixed.acidity : num 7.4 7.8 7.8 11.2 7.4 7.4 7.9 7.3 7.8 7.5 ...
## $ volatile.acidity : num 0.7 0.88 0.76 0.28 0.7 0.66 0.6 0.65 0.58 0.5 ...
## $ citric.acid : num 0 0 0.04 0.56 0 0 0.06 0 0.02 0.36 ...
## $ residual.sugar : num 1.9 2.6 2.3 1.9 1.9 1.8 1.6 1.2 2 6.1 ...
## $ chlorides : num 0.076 0.098 0.092 0.075 0.076 0.075 0.069 0.065 0.073 0.071 ...
## $ free.sulfur.dioxide : num 11 25 15 17 11 13 15 15 9 17 ...
## $ total.sulfur.dioxide: num 34 67 54 60 34 40 59 21 18 102 ...
## $ density : num 0.998 0.997 0.997 0.998 0.998 ...
## $ pH : num 3.51 3.2 3.26 3.16 3.51 3.51 3.3 3.39 3.36 3.35 ...
## $ sulphates : num 0.56 0.68 0.65 0.58 0.56 0.56 0.46 0.47 0.57 0.8 ...
## $ alcohol : num 9.4 9.8 9.8 9.8 9.4 9.4 9.4 10 9.5 10.5 ...
## $ quality : int 5 5 5 6 5 5 5 7 7 5 ...
## fixed.acidity volatile.acidity citric.acid residual.sugar
## Min. : 4.60 Min. :0.1200 Min. :0.000 Min. : 0.900
## 1st Qu.: 7.10 1st Qu.:0.3900 1st Qu.:0.090 1st Qu.: 1.900
## Median : 7.90 Median :0.5200 Median :0.260 Median : 2.200
## Mean : 8.32 Mean :0.5278 Mean :0.271 Mean : 2.539
## 3rd Qu.: 9.20 3rd Qu.:0.6400 3rd Qu.:0.420 3rd Qu.: 2.600
## Max. :15.90 Max. :1.5800 Max. :1.000 Max. :15.500
## chlorides free.sulfur.dioxide total.sulfur.dioxide
## Min. :0.01200 Min. : 1.00 Min. : 6.00
## 1st Qu.:0.07000 1st Qu.: 7.00 1st Qu.: 22.00
## Median :0.07900 Median :14.00 Median : 38.00
## Mean :0.08747 Mean :15.87 Mean : 46.47
## 3rd Qu.:0.09000 3rd Qu.:21.00 3rd Qu.: 62.00
## Max. :0.61100 Max. :72.00 Max. :289.00
## density pH sulphates alcohol
## Min. :0.9901 Min. :2.740 Min. :0.3300 Min. : 8.40
## 1st Qu.:0.9956 1st Qu.:3.210 1st Qu.:0.5500 1st Qu.: 9.50
## Median :0.9968 Median :3.310 Median :0.6200 Median :10.20
## Mean :0.9967 Mean :3.311 Mean :0.6581 Mean :10.42
## 3rd Qu.:0.9978 3rd Qu.:3.400 3rd Qu.:0.7300 3rd Qu.:11.10
## Max. :1.0037 Max. :4.010 Max. :2.0000 Max. :14.90
## quality
## Min. :3.000
## 1st Qu.:5.000
## Median :6.000
## Mean :5.636
## 3rd Qu.:6.000
## Max. :8.000
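The output above has the shape of what `dim()`, `str()` and `summary()` print for a data frame. The following is a minimal sketch of those three calls, not necessarily the code that produced the report. The file name in the comment is an assumption, and `wine_head` is a hypothetical stand-in holding only the first 10 rows of four columns, copied from the `str()` output, so the numbers it prints will differ from the full-data values shown above.

```r
# Assumption: the full data would be loaded from a CSV file, for example
# wine <- read.csv("wineQualityReds.csv")  # hypothetical file name

# Stand-in: first 10 rows of four columns, copied from the str() output above
wine_head <- data.frame(
  fixed.acidity = c(7.4, 7.8, 7.8, 11.2, 7.4, 7.4, 7.9, 7.3, 7.8, 7.5),
  pH            = c(3.51, 3.2, 3.26, 3.16, 3.51, 3.51, 3.3, 3.39, 3.36, 3.35),
  alcohol       = c(9.4, 9.8, 9.8, 9.8, 9.4, 9.4, 9.4, 10, 9.5, 10.5),
  quality       = c(5L, 5L, 5L, 6L, 5L, 5L, 5L, 7L, 7L, 5L)
)

dim(wine_head)      # number of rows and columns
str(wine_head)      # type and first values of each column
summary(wine_head)  # min, quartiles, median, mean and max per column
```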
Most wines are graded 5 for quality.
Alcohol content mostly ranges from 9 to 14, with a few wines below 9 (minimum 8.4) or above 14 (maximum 14.9).
As the plot above shows, the pH of most wines lies between 3 and 3.5; the full range runs from 2.74 to 4.01.
Minimum and maximum pH:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.740 3.210 3.310 3.311 3.400 4.010
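A claim such as "most wines have a pH between 3 and 3.5" can be checked directly by taking the mean of a logical vector, which gives the share of values that meet the condition. This is a sketch only: `ph_sample` is a hypothetical name and holds just the first 10 pH values from the `str()` output, so the share it reports describes that small sample, not the full data set.

```r
# First 10 pH values, copied from the str() output (not the full data)
ph_sample <- c(3.51, 3.2, 3.26, 3.16, 3.51, 3.51, 3.3, 3.39, 3.36, 3.35)

# Same six-number summary as the output above, for the sample only
summary(ph_sample)

# Share of values strictly between 3 and 3.5:
# the mean of a logical vector is the proportion of TRUE entries
share_mid <- mean(ph_sample > 3 & ph_sample < 3.5)
share_mid
```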
The plot above shows the sulphates content of the wines; most wines contain between 0.5 and 1.
The plot above shows that wine density is approximately normally distributed.
As the plot above shows, most wines in this dataset have low volatile acidity.
The fixed acidity distribution is right skewed with some high outliers: most wines cluster at the lower values, and a long tail of high values pulls the mean (8.32) above the median (7.9).
# Univariate Analysis
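As a rough numeric check on the right skew described for fixed acidity, the mean can be compared with the median: in a right-skewed distribution a few large values pull the mean above the median. The sketch below uses only the first 10 fixed acidity values from the `str()` output (the name `fixed_acidity_sample` is hypothetical), so it illustrates the idea rather than reproducing the full-data figures.

```r
# First 10 fixed.acidity values, copied from the str() output (not the full data)
fixed_acidity_sample <- c(7.4, 7.8, 7.8, 11.2, 7.4, 7.4, 7.9, 7.3, 7.8, 7.5)

mean_fa   <- mean(fixed_acidity_sample)
median_fa <- median(fixed_acidity_sample)

# A single high value (11.2) is enough to lift the mean above the median
c(mean = mean_fa, median = median_fa)
mean_fa > median_fa
```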
There are 1599 wines in the dataset with 12 features (fixed.acidity, volatile.acidity, citric.acid, residual.sugar, chlorides, free.sulfur.dioxide, total.sulfur.dioxide, density, pH, sulphates, alcohol and quality). The variable quality is an ordinal variable (stored as an integer in the output above), ordered as follows.
(worst) -> (best)
quality: a score between 0 and 10; this dataset contains only scores from 3 to 8.
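Since `str()` reports quality as an integer, treating it as an ordered factor requires an explicit conversion. The sketch below shows one way to do that with `factor(..., ordered = TRUE)`, using the observed range 3 to 8 as the levels. The variable names are hypothetical, and the input is only the first 10 quality scores from the `str()` output.

```r
# First 10 quality scores, copied from the str() output (not the full data)
quality_sample <- c(5L, 5L, 5L, 6L, 5L, 5L, 5L, 7L, 7L, 5L)

# Ordered factor with the levels observed in this dataset, worst (3) to best (8)
quality_factor <- factor(quality_sample, levels = 3:8, ordered = TRUE)

levels(quality_factor)   # all six levels, including those absent from the sample
table(quality_factor)    # count of wines per quality level

# Ordered factors support comparisons between levels
quality_factor[4] < quality_factor[8]   # compares a score of 6 with a score of 7
```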
Other observations:
The median quality score is 6.
Most wines have a pH of around 3.3 (the median is 3.31).
The median chlorides value is 0.079; the minimum is 0.012 and the maximum is 0.611.
The main feature of interest in the data set is the quality grade of the wine. I'd like to determine which features are best for predicting the quality of a wine. I suspect alcohol and some combination of the other variables can be used to build a predictive model of the quality grade.
### What other features in the dataset do you think will help support your
I think alcohol, pH and residual sugar.
No.
### Of the features you investigated, were there any unusual distributions? Did you perform any operations on the data to tidy, adjust, or change the form of the data?
No
From the chart above, free.sulfur.dioxide, residual.sugar and pH do not seem to be strongly correlated with quality. The variable most correlated with quality is alcohol, and density is the variable most correlated with alcohol. I want to look closer at plots involving quality and some other variables, such as alcohol and density.
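A correlation chart of this kind is typically built from a matrix of pairwise correlation coefficients. The sketch below computes such a matrix with base R's `cor()`, which uses Pearson correlation by default; whether the report used the same method is an assumption. `wine_head` is a hypothetical stand-in with the first 10 rows of four columns from the `str()` output, so the coefficients it yields will not match those of the full data.

```r
# Stand-in: first 10 rows of four columns, copied from the str() output
wine_head <- data.frame(
  fixed.acidity = c(7.4, 7.8, 7.8, 11.2, 7.4, 7.4, 7.9, 7.3, 7.8, 7.5),
  pH            = c(3.51, 3.2, 3.26, 3.16, 3.51, 3.51, 3.3, 3.39, 3.36, 3.35),
  alcohol       = c(9.4, 9.8, 9.8, 9.8, 9.4, 9.4, 9.4, 10, 9.5, 10.5),
  quality       = c(5L, 5L, 5L, 6L, 5L, 5L, 5L, 7L, 7L, 5L)
)

# Pairwise Pearson correlations between all columns
cor_matrix <- cor(wine_head)
round(cor_matrix, 2)

# One column of the matrix: how each variable correlates with quality
round(cor_matrix[, "quality"], 2)
```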
The box plot shows sulphates for each quality score: the median sulphates level increases as wine quality increases.
The plot above shows the relationship between quality and alcohol: higher-quality wines have higher alcohol content.
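The medians that a box plot draws for each quality score can also be computed directly with `tapply()`. This is a sketch under the same caveat as before: the vectors hold only the first 10 rows from the `str()` output (names are hypothetical), and with so few rows the trend seen in the full data does not have to appear.

```r
# First 10 alcohol values and quality scores, copied from the str() output
alcohol_sample <- c(9.4, 9.8, 9.8, 9.8, 9.4, 9.4, 9.4, 10, 9.5, 10.5)
quality_sample <- c(5L, 5L, 5L, 6L, 5L, 5L, 5L, 7L, 7L, 5L)

# Median alcohol within each quality score (the center line of each box)
alcohol_by_quality <- tapply(alcohol_sample, quality_sample, median)
alcohol_by_quality

# Number of wines behind each median, to judge how reliable it is
table(quality_sample)
```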
The scatter plot above shows a negative correlation: density decreases as alcohol increases.
The correlation matrix chart shows a correlation coefficient of -0.68 between pH and fixed.acidity, which indicates a strong negative correlation. The scatter plot above shows this relationship: pH increases as fixed.acidity decreases.
# Bivariate Analysis
Quality correlates with alcohol and sulphates; alcohol correlates with density.
Yes, there is a strong relationship between alcohol and density.
### What was the strongest relationship you found?
Quality correlates most strongly with alcohol, with a correlation coefficient of 0.5. Density and alcohol are negatively correlated, with a correlation coefficient of -0.5.
# Multivariate Plots Section
The dark blue points, which represent the worst quality, sit in the region of high density and low alcohol.
As the plot shows, higher-quality wines have low volatile acidity and high alcohol.
The plot above shows a strong correlation between free.sulfur.dioxide and total.sulfur.dioxide, broken down by alcohol percentage: free.sulfur.dioxide increases as total.sulfur.dioxide increases.
Most wines have a quality score of 5 or 6.
The heat map above shows that the best quality scores come with a high alcohol percentage.
The worst quality scores come with high density and a low alcohol percentage.
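The counts behind a heat map of quality against alcohol can be sketched by binning alcohol with `cut()` and cross-tabulating the bins with quality. The break points below are arbitrary assumptions chosen for illustration, not the bins used in the plot, and the data are only the first 10 rows from the `str()` output (variable names are hypothetical).

```r
# First 10 alcohol values and quality scores, copied from the str() output
alcohol_sample <- c(9.4, 9.8, 9.8, 9.8, 9.4, 9.4, 9.4, 10, 9.5, 10.5)
quality_sample <- c(5L, 5L, 5L, 6L, 5L, 5L, 5L, 7L, 7L, 5L)

# Bin alcohol into three bands; the break points are illustrative assumptions
alcohol_band <- cut(alcohol_sample, breaks = c(8, 9.5, 10, 15))

# Count of wines in each (alcohol band, quality score) cell
heat_counts <- table(alcohol_band, quality_sample)
heat_counts
```

Each cell of `heat_counts` corresponds to one tile of a heat map, with the count deciding the fill color.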
This is my first R project; I had not used the R programming language before. I chose the red wine data set, which contains 1599 observations and 12 features. It took me more than 10 hours to explore. It was not difficult, and I enjoyed it. When I have learned more R, I will come back to it and explore it further.